Project description

We are studying the effect of H151, an inhibitor of the cGAS-STING signaling pathway, on Jurkat, a T-ALL model cell line.

The biological experiment revealed that H151 causes cell death, which suggests that the pathway is important for the survival of T-ALL cells.

We want to explore the differentially expressed genes (DEGs) between the control condition and the condition in which the pathway is inhibited. We will therefore perform a differential expression analysis (DEA) followed by a pathway enrichment analysis (PEA).

Following a basic RNA-seq pipeline analysis

Importing data

Raw counts

Metadata

Experimental data

Data exploration

First, let’s view the distribution of the different biotypes in our data:

In the downstream analysis (DEA), we’ll focus on the top two biotypes (protein_coding and lncRNA). Additional filtering will be applied:

  • MaxCount_threshold = 20 (at least one sample must have a read count above this value)
  • CpmCount_threshold = 0.5 (counts-per-million (CPM) threshold)
  • MinSample = 3 (minimum number of samples that must pass the CPM threshold)
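
The three thresholds above can be sketched in base R as follows. This is a minimal illustration on a toy count matrix, not the pipeline's actual filtering code: the variable names are hypothetical, and it assumes that a gene must satisfy both the maximum-count rule and the CPM rule, with strict "greater than" comparisons.

```r
# Thresholds taken from the list above (variable names are hypothetical)
max_count_threshold <- 20
cpm_threshold <- 0.5
min_samples <- 3

# Toy count matrix: 4 genes x 6 samples (library sizes are unrealistically small)
counts <- matrix(
  c(100, 120, 90, 110, 95, 105,  # geneA: well expressed in every sample
      0,   0, 25,   0,  0,   0,  # geneB: one high count, expressed in 1 sample only
      5,   6,  4,   7,  5,   6,  # geneC: expressed everywhere, but never above 20 reads
      0,   0,  0,   0,  0,   0), # geneD: not expressed
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("sample", 1:6))
)

# Counts per million: divide each column by its library size, then scale by 1e6
lib_sizes <- colSums(counts)
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# Rule 1: at least one sample has a raw count above the maximum-count threshold
passes_max <- apply(counts, 1, max) > max_count_threshold

# Rule 2: at least `min_samples` samples exceed the CPM threshold
passes_cpm <- rowSums(cpm > cpm_threshold) >= min_samples

# Assumption: both rules must hold for a gene to be kept
keep <- passes_max & passes_cpm
filtered_counts <- counts[keep, , drop = FALSE]
print(rownames(filtered_counts))
```

In this toy example only geneA is kept: geneB fails the minimum-sample rule and geneC never exceeds 20 reads.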



Now, to understand the global gene expression landscape and to check the quality of our data, we perform a dimensionality reduction analysis: principal component analysis (PCA).

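PCA is most informative when:

  • The variance is roughly constant across the range of mean expression (homoscedasticity)
  • The values are roughly normally distributed

Raw counts meet neither condition: their variance grows with the mean, so highly expressed genes would dominate the principal components. We therefore need to transform the raw counts first.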

We’ll use the Variance Stabilizing Transformation (VST) from the DESeq2 package. This will:

  • Normalize for library size (DESeq2 computes size factors using a median-of-ratios method)
  • Use a negative binomial model to estimate dispersion
  • Apply a variance-stabilizing mathematical transformation
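
The transformation and the PCA can be sketched with DESeq2 as shown below. This is an illustration under stated assumptions rather than the exact code used in this report: it runs on a simulated dataset from `makeExampleDESeqDataSet()`, whose grouping column is called `condition`, and the object names `dds_example` and `vsd` are hypothetical. With the real data, the DESeqDataSet and the `Condition` column would be used instead.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset standing in for the real counts (3000 genes, 6 samples)
dds_example <- makeExampleDESeqDataSet(n = 3000, m = 6)

# blind = TRUE ignores the design, which is the usual choice for quality control
vsd <- vst(dds_example, blind = TRUE)

# returnData = TRUE returns the PC coordinates instead of drawing the plot
pca_data <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)

# Percentage of variance explained by PC1 and PC2
percent_var <- round(100 * attr(pca_data, "percentVar"))
print(pca_data)
print(percent_var)

# Default PCA plot, colored by the grouping column
plotPCA(vsd, intgroup = "condition")
```

By default, `plotPCA()` uses only the 500 most variable genes (`ntop = 500`), so the result reflects the genes that contribute most to sample-to-sample differences.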

Differential expression analysis (DEA)

Now let’s dive into gene expression.

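dds <- pre_process_results$DESeqData
## Set the control as the reference level, so log2 fold changes are reported relative to it
dds$Condition <- relevel(dds$Condition, ref = "Jurkat_ct")
## Estimate size factors and dispersions, then fit the negative binomial model
dds <- DESeq(dds)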

Let’s get a quick overview of the distribution of the DEGs
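
One simple way to tabulate such an overview is sketched below. It is a hedged example, not the report's own code: it uses a simulated dataset (levels `A` and `B` of a `condition` column), and the cutoffs `padj_cutoff = 0.05` and `lfc_cutoff = 1` are assumptions chosen for illustration only.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset with true differences between groups (betaSD = 1)
dds_example <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
dds_example <- DESeq(dds_example)

# Hypothetical cutoffs; adjust to the study's own criteria
padj_cutoff <- 0.05
lfc_cutoff <- 1

# Log2 fold changes of level B relative to level A
res <- results(dds_example, contrast = c("condition", "B", "A"), alpha = padj_cutoff)
res_df <- as.data.frame(res)

# Genes with NA adjusted p-values (e.g. filtered out) count as not significant
significant <- !is.na(res_df$padj) & res_df$padj < padj_cutoff

res_df$status <- "not significant"
res_df$status[significant & res_df$log2FoldChange > lfc_cutoff] <- "up"
res_df$status[significant & res_df$log2FoldChange < -lfc_cutoff] <- "down"

# Number of genes in each category
print(table(res_df$status))
```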